
obj: recalculate curr_allocated on underflow #37

Merged
osalyk merged 11 commits into stable-2.1 from liang/pmdk/pmdk_fix_curr_allocated on Apr 24, 2026

@gnailzenh

@gnailzenh gnailzenh commented Apr 23, 2026

Although heap_curr_allocated is stored persistently, it is neither updated transactionally nor reliably persisted. For example, if a transaction succeeds but the process is terminated before heap_curr_allocated is updated, the value of heap_curr_allocated falls out of sync with the actual heap state.

The most obvious symptom appears when heap_curr_allocated is smaller than the sum of the sizes of all allocations in the heap: once all of those allocations are freed, heap_curr_allocated underflows and wraps to a very large value, larger than the heap size.
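The wrap-around can be reproduced with plain unsigned arithmetic. The sketch below is not PMDK code: `counter_after_free_all` and its parameters are hypothetical names, and it only assumes that the counter is a 64-bit unsigned value from which the size of each freed allocation is subtracted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the persistent counter (not the PMDK layout):
 * subtract the size of every freed allocation from the counter.
 */
static uint64_t
counter_after_free_all(uint64_t counter, const uint64_t *sizes, size_t n)
{
	assert(sizes != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		counter -= sizes[i]; /* unsigned subtraction wraps modulo 2^64 */

	return counter;
}
```

With live allocations of 128, 128 and 256 bytes, a correct counter of 512 ends at 0. A counter that missed one update and holds 256 instead ends at `UINT64_MAX - 255`, which is larger than any realistic heap size.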

Ref: https://daosio.atlassian.net/browse/DAOS-18882

This workaround detects that most obvious case and recalculates the value.
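A minimal sketch of the detect-and-recalculate idea, under stated assumptions: `struct block_info`, `recalculate_allocated` and `checked_allocated` are hypothetical names invented for illustration and do not reflect the actual libpmemobj data structures or the code merged in this PR. The only parts taken from the discussion are the test "stored value larger than the heap size" and the rebuild from the recorded state of the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one heap block as recorded on media. */
struct block_info {
	uint64_t size;	/* real size of the block in bytes */
	int allocated;	/* non-zero if the block is currently allocated */
};

/* Rebuild the counter by summing the sizes of all allocated blocks. */
static uint64_t
recalculate_allocated(const struct block_info *blocks, size_t n)
{
	assert(blocks != NULL || n == 0);

	uint64_t sum = 0;
	for (size_t i = 0; i < n; i++) {
		if (blocks[i].allocated)
			sum += blocks[i].size;
	}

	return sum;
}

/*
 * Return a usable counter value. A stored value larger than the heap cannot
 * be valid (this also catches wrapped values such as UINT64_MAX), so it is
 * replaced by a value recomputed from the blocks.
 */
static uint64_t
checked_allocated(uint64_t stored, uint64_t heap_size,
	const struct block_info *blocks, size_t n)
{
	if (stored > heap_size)
		return recalculate_allocated(blocks, n);

	return stored;
}
```

Note the limitation of this check: a counter that is wrong but still no larger than the heap size is returned unchanged, which is why only the most obvious case is covered.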



@janekmi janekmi changed the base branch from release-2.1.3 to stable-2.1 April 23, 2026 18:18
gnailzenh and others added 2 commits April 23, 2026 18:21
rebuild heap_curr_allocated from on-media state if the counter has
underflowed.

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
@janekmi janekmi force-pushed the liang/pmdk/pmdk_fix_curr_allocated branch from f1aa619 to 483b054 Compare April 23, 2026 20:12
janekmi added 4 commits April 23, 2026 20:37
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
@janekmi janekmi requested review from grom72 and osalyk April 23, 2026 22:24
@janekmi janekmi changed the title from "DAOS-18882 pmdk: rebuild heap_curr_allocated from on-media state" to "obj: recalculate curr_allocated on underflow" Apr 23, 2026
@NiuYawei NiuYawei requested a review from sherintg April 24, 2026 03:57
Contributor

@grom72 grom72 left a comment


@grom72 reviewed 10 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on gnailzenh, osalyk, and sherintg).


src/test/obj_ctl_stats_curr_allocated_wa/obj_ctl_stats_curr_allocated_wa.c line 38 at r1 (raw file):

	ret = pmemobj_ctl_get(pop, "stats.heap.curr_allocated", &allocated);
	UT_ASSERTeq(ret, 0);
	UT_ASSERTne(allocated, 0);

We know the expected value, since we check it in the log match.

Suggestion:

	ret = pmemobj_ctl_get(pop, "stats.heap.curr_allocated", &allocated);
	UT_ASSERTeq(ret, 0);
	UT_ASSERTeq(allocated, 0);
	
	PMEMoid oid;
	ret = pmemobj_alloc(pop, &oid, 1, 0, NULL, NULL);
	UT_ASSERTeq(ret, 0);

	ret = pmemobj_ctl_get(pop, "stats.heap.curr_allocated", &allocated);
	UT_ASSERTeq(ret, 0);
	UT_ASSERTeq(allocated, 128);


@NiuYawei NiuYawei left a comment


LGTM.

grom72 added a commit to daos-stack/daos that referenced this pull request Apr 24, 2026
Update PMDK to incorporate the following fixes:
- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2

Allow-unstable-test: true

Focus validation on PMem version

Skip-func-hw-test-medium: false
Skip-func-hw-test-medium-md-on-ssd: true
Skip-func-hw-test-medium-vmd: false
Skip-func-hw-test-medium-verbs-provider: false
Skip-func-hw-test-medium-verbs-provider-md-on-ssd: true
Skip-func-hw-test-large: false
Skip-func-hw-test-large-md-on-ssd: true
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
janekmi and others added 2 commits April 24, 2026 10:51
... move recalculation to the getter.

Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Co-authored-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
grom72 added a commit to daos-stack/daos that referenced this pull request Apr 24, 2026
Update PMDK to incorporate the following fixes:
- fix "The pool was not closed" message (no ADR failure) daos-stack/pmdk#36
- recalculate curr_allocated on underflow daos-stack/pmdk#37

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@hpe.com>

Priority: 2

Allow-unstable-test: true

Skip-func-hw-test-medium: false
Skip-func-hw-test-large: false
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Contributor

@grom72 grom72 left a comment


@grom72 reviewed 5 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on gnailzenh, osalyk, and sherintg).


src/libpmemobj/stats.c line 64 at r2 (raw file):

	 */
	if (*argv > *pop->heap.sizep) { /* covers the == UINT64_MAX case */
		/* if the value is broken, recalculate it */

CORE_LOG_WARNING is missing.

Contributor

@osalyk osalyk left a comment


@osalyk reviewed 9 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on gnailzenh, grom72, and sherintg).


src/libpmemobj/stats.c line 64 at r2 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

CORE_LOG_WARNING is missing.

👍

janekmi added 2 commits April 24, 2026 13:23
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Signed-off-by: Jan Michalski <jan-marian.michalski@hpe.com>
Contributor

@janekmi janekmi left a comment


@janekmi made 2 comments.
Reviewable status: 6 of 13 files reviewed, 2 unresolved discussions (waiting on grom72, osalyk, and sherintg).


src/libpmemobj/stats.c line 64 at r2 (raw file):

Previously, osalyk (Oksana Sałyk) wrote…

👍

Done.


src/test/obj_ctl_stats_curr_allocated_wa/obj_ctl_stats_curr_allocated_wa.c line 38 at r1 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

We know the expected value, since we check it in the log match.

Please see the updated version.

Contributor

@osalyk osalyk left a comment


LGTM

@osalyk reviewed 7 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on grom72 and sherintg).

@osalyk osalyk merged commit 69925cf into stable-2.1 Apr 24, 2026
8 of 9 checks passed
osalyk added a commit to daos-stack/daos that referenced this pull request Apr 24, 2026
Signed-off-by: Oksana Salyk <oksana.salyk@hpe.com>

5 participants